ABSTRACT 


The chloroplast gene atpB was sequenced for seven genera of the Lardizabalaceae and three outgroup taxa to 
assess its utility as a source of phylogenetic information. The resulting phylogenetic tree was compared with trees 
based on 18S nuclear ribosomal DNA and rbcL (chloroplast DNA) sequences, as well as a combination of all data 
(atpB, 18S, and rbcL) for the same taxa. Sequence divergence values, statistics related to patterns of character 
transformation, and indices measuring homoplasy and branch support were also compared. The topologies of the trees 
derived from atpB, 18S, and a combination of all three sequence data sets were largely congruent. All phylogenies, 
with the exception of the tree derived from rbcL data, supported the monophyly of the Lardizabalaceae. All indicators 
of nucleotide substitution rate suggest that rbcL is the least conserved, atpB is intermediate, and 18S is the most 
conserved of the three genes sequenced. Measures of homoplasy also indicate that the rbcL tree is less strongly 
supported than those based on atpB, 18S, or a combination of atpB, 18S, and rbcL sequence data. 





Phylogenetic analyses of higher-level plant 
groups using DNA sequence data have been based 
most often on the chloroplast gene rbcL or, less 
frequently, on 18S nuclear ribosomal DNA (18S 
nrDNA). Few phylogenetic studies have used other 
gene sequences across a broad range of taxa, and 
still fewer have compared results from two or more 
gene sequences for the same taxa (e.g., Baldwin, 
1992; Johnson & Soltis, 1994; Olmstead & Sweere, 
1994). As part of an intensive systematic study of 
phylogenetic relationships among basal eudicots 
(Ranunculidae and “lower” Hamamelididae), we 
have further developed the chloroplast gene, atpB, 
as a new source of phylogenetically informative 
data (Ritland & Clegg, 1987). Here, we apply the 
atpB gene to resolve phylogenetic relationships in 
the angiosperm family Lardizabalaceae and compare 
the results with those based on rbcL and 18S 
nrDNA for the same taxa. Cladograms based on 
the three genes are evaluated and compared in 
terms of their resolution and congruence, as well 
as various measures of phylogenetic signal, transition/transversion 
bias, sequence divergence, and 
homoplasy. Based on these data, we discuss the 
effectiveness of using each gene for phylogenetic 
studies at the generic level and above. 

The Lardizabalaceae (Ranunculidae, sensu 
Takhtajan, 1987; “ranunculids”) are a family of 
twining (rarely erect) shrubs found in temperate 
areas of Eastern Asia and South America. The 
family is characterized by alternate, palmate (rare- 
ly pinnate) compound leaves; regular unisexual 
flowers; six overlapping or valvate sepals (three in 
Akebia); staminodia or petals small or absent; three 
FIGURE 1. Location of the atpB gene in relation to a portion of the tobacco chloroplast genome. Top numbers 
indicate tobacco coordinate units × 1000 (TCU; Shinozaki et al., 1986). Line below the TCU scale maps a portion 
of the single copy region and inverted repeat of the tobacco chloroplast genome from TCU 40,000 to TCU 90,000. 
Black bar at approximately 87,000 TCUs is the beginning of the inverted repeat. Genes above the tobacco genome 
map (Shinozaki et al., 1986) are transcribed from left to right; those below the line are transcribed from right to left. 
The bold line beneath the tobacco genome map illustrates the bracketed region above in more detail. Filled arrows 
indicate the location and direction of the amplification primers for the atpB gene. Hollow arrows indicate the location 
and direction of the internal sequencing primers. 


carpels (up to nine in Akebia); numerous ovules 
(four, by abortion, in Boquila) with laminar placentation 
(submarginal in Decaisnea and Sinofranchetia). 

Previous phylogenetic analyses based on rbcL 
sequence data suggest that the Lardizabalaceae 
occupy a key and potentially basal position in the 
evolution of the Ranunculidae, but the family was 
represented by only the genus Akebia (Chase et 
al., 1993). This molecular study was undertaken 
to clarify intergeneric relationships in the family, 
to test the results of previous morphologically based 
cladistic analyses (Loconte & Estes, 1989), and to 
provide a more secure basis for representing the 
family in our ongoing investigation of basal eudicot 
radiation. Of particular interest are the phyloge- 
netic positions of Decaisnea and Sinofranchetia 
(the only Lardizabalaceae genera with sub-marginal 
placentation) with respect to the other genera and 
the relationships of the two dioecious South Amer- 
ican genera, Boquila and Lardizabala, the only 
representatives of the family outside eastern Asia. 


THE atpB GENE 


The atpB gene is located in the large single- 
copy region of the chloroplast genome contiguous 
with the atpE gene and downstream from the rbcL 
gene, from which it is separated by an approxi- 
mately 900 bp intergenic spacer region (Fig. 1). 
The atpB gene encodes the β subunit of ATP 
synthase (other subunits are encoded in either the 
chloroplast or the nuclear genomes). ATP synthase 
has a highly conserved structure that couples pro- 
ton translocation across membranes with the syn- 
thesis of ATP (Zurawski et al., 1982; Gatenby et 
al., 1989). Prior to this study, the chloroplast 
atpB gene had been sequenced for approximately 
ten genera, representing a diverse range of plants 
(e.g., Chlamydomonas, Marchantia, Spinacia, 
Nicotiana, and Oryza). 

Many features of the atpB gene suggest that it 
may be valuable for comparative sequence studies 
at higher taxonomic levels. It is short enough (1497 
bp) for ease of sequencing but long enough to be 
potentially phylogenetically informative, given 
broadly comparable rates of evolution to rbcL. The 
evolutionary rate is conserved, and Ks (a measure 
of the rate of synonymous nucleotide substitution 
in the gene; Li et al., 1985; Wolfe, 1991) between 
rice and tobacco is 0.62, indicating a rate of evolution 
very similar to that found for rbcL (Ks = 
0.63; Wolfe, 1991). There are no reported insertions 
and deletions in the atpB gene, the gene does 
not contain introns, and atpB sequences are readily 
aligned. 

MATERIALS AND METHODS 


DNA EXTRACTION AND AMPLIFICATION 


The seven genera of Lardizabalaceae and three 
outgroup genera used in this study are indicated 
TABLE 2. Location (position based on atpB sequence for Spinacia, Zurawski et al., 1982) and base composition 
of amplification and internal sequencing primers for the chloroplast gene, atpB. All primers were designed by S. 
Hoot. 

Amplification primers 
RBCL1     5' GAA TCC AAC ACT TGC TTT ACT CTC T 
          (for amplification of the spacer region between atpB and rbcL as well as 
          the atpB gene) 
S2        5' TAT GAG AAT CAA TCC TAC TAC TTC T 
          (amplifies the atpB gene only) 
S1494R    3' TCA GTA CAC AAA GAT TTA AGG TCA T 

Internal sequencing primers, forward direction 
S20       5' CTT CTC ATC CTC GGG TTT CCA CAC T 
S335      5' ACG TGC TTG GGG ACC CTC TTC ATA A 
S611      5' AAC GTA CTC GTG AAG GAA ATG ATC T 
S1022     5' CGA CAT TTC CAC ATT TAG ATG CTA C 
S1277     5' AAA TTC AGC GTT TCT TAT CAC AAC C 

Internal sequencing primers, reverse direction 
S26       5' AGA AGT AGT AGG ATT GAT TCT CAT A 
S385R     5' GCG CAG ATC TAT GAA TAG GAG ACG T 
S766R     5' TAA CAT CTC GGA AAT ATT CCG CCA T 
S1186R    5' TCT CCT CAA CTT CTT TCT AAC GTT C 
S1494R    same as amplification primer above 





in Table 1, along with sequencing, accession, and 
voucher information. Total cellular DNA was iso- 
lated from fresh, silica-dried, or herbarium material 
according to the miniprep method of Doyle & 
Doyle (1987). In some cases (Boquila trifoliata, 
Sinofranchetia chinensis, Stauntonia hexaphyl- 
la), DNA was further purified and concentrated 
after extraction using GeneClean (GeneClean, Bio 
101, Inc.). 

A segment of double-stranded DNA containing 
most of the coding sequence (approximately 1474 
bp) for the atpB gene was amplified using the 
polymerase chain reaction (PCR). Three amplifi- 
cation primers were designed using atpB sequences 
available from GenBank for spinach, tobacco, pea, 
sweet potato, maize, and wheat. Two alternate 25- 
mer 5’ primers were used. One is located at the 
5' end of the rbcL coding sequence (nucleotide 
positions 15-39 in tobacco). It has the opposite 
orientation from rbcL but the same orientation as 
atpB, and includes the intergenic spacer region 
between rbcL and atpB in the amplification prod- 
uct (Table 2, Fig. 1). The other primer begins with 
the first base upstream of the atpB start codon 
and includes the first 24 bases of the atpB coding 
sequence (positions 1-24 in spinach). The 3’ 25-mer 
amplification primer, S1494R, is located at 
the 5’ end of the adjacent gene, atpE (positions 
1-25 of the atpE gene in spinach), and has the 
opposite orientation to the atpB gene (Table 2). 
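
The orientation statements above can be checked mechanically. The following is a minimal sketch, not part of the original methods, that reverse-complements a primer string; the function and variable names are our own, and the two sequences are copied from Table 2. It suggests that the first reverse internal sequencing primer anneals to the same site as the forward amplification primer S2, on the opposite strand.

```python
# Translation table for complementing unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string, ignoring spaces."""
    cleaned = seq.replace(" ", "").upper()
    return cleaned.translate(COMPLEMENT)[::-1]


# Sequences as printed in Table 2 (variable names are illustrative only).
s2_forward = "TAT GAG AAT CAA TCC TAC TAC TTC T"
first_reverse_internal = "AGA AGT AGT AGG ATT GAT TCT CAT A"

# True when the two primers cover the same site on opposite strands.
same_site = reverse_complement(s2_forward) == first_reverse_internal.replace(" ", "")
print(same_site)
```

The same helper can be applied to any other primer pair in Table 2 to confirm which strand a primer reads.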

Two alternative protocols differing only in MgCl2 
concentration and annealing temperature were used 
to amplify atpB. In the first protocol, the reaction 
mixture contained (either final concentrations or 
amounts in a 100 µL reaction): 10 mM Tris-HCl, 
pH 8.3, 50 mM KCl, 1.5 mM MgCl2, 0.2 mM of 
each deoxyribonucleotide triphosphate (dNTP), 0.5 
µM of each amplification primer, 2.5 U Taq polymerase, 
and 0.3-2.0 µL template DNA (depending on 
concentration). To prevent evaporation during 
thermal cycling, a drop of mineral oil was added 
to each reaction mixture. The sample was then 
placed in a thermocycler (M. J. Research, Inc., 
Cambridge, Massachusetts) with the following cy- 
cling parameters: premelt at 92°C for 3 min.; 30 
cycles, each consisting of a denaturation step at 
92°C for 1 min., annealing step at 55°C for 1 min., 
and an extension step at 72°C for 3 min.; followed 
by a final extension step of 72°C for 7 min. The 
alternative protocol included the following modifications: 
the concentration of MgCl2 was doubled 
and the annealing step temperature lowered to 
50°C. In most cases one of these two protocols 
produced amplification product. In those cases 
where yield was still very weak, further amplifi- 
cation directly from the Gene-Cleaned amplification 
product often produced increased yields. 
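
For readers who want to reproduce the cycling schedule, the two protocols can be written down as data. This is only an illustrative sketch: the class, dictionary keys, and function names are our own, the temperatures and hold times are those given above, and ramp times between steps are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    temp_c: float
    minutes: float


# Values from the text; the keys are hypothetical labels.
PROTOCOLS = {
    "first": {"mgcl2_mm": 1.5, "annealing_c": 55.0},
    # MgCl2 doubled and annealing temperature lowered.
    "alternative": {"mgcl2_mm": 3.0, "annealing_c": 50.0},
}
CYCLES = 30


def build_program(annealing_c: float):
    """Return (premelt, per-cycle steps, final extension) for one protocol."""
    premelt = Step("premelt", 92.0, 3.0)
    cycle = (
        Step("denaturation", 92.0, 1.0),
        Step("annealing", annealing_c, 1.0),
        Step("extension", 72.0, 3.0),
    )
    final = Step("final extension", 72.0, 7.0)
    return premelt, cycle, final


def total_minutes(annealing_c: float, cycles: int = CYCLES) -> float:
    """Sum the programmed hold times (ramp times are not included)."""
    premelt, cycle, final = build_program(annealing_c)
    return premelt.minutes + cycles * sum(s.minutes for s in cycle) + final.minutes


for label, settings in PROTOCOLS.items():
    print(label, settings["mgcl2_mm"], "mM MgCl2,",
          total_minutes(settings["annealing_c"]), "min of programmed hold time")
```

Both protocols have the same programmed duration, since they differ only in MgCl2 concentration and annealing temperature.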

In most cases, the amplification primers for the 
chloroplast gene rbcL were those described in Olmstead 
et al. (1992). In Sinofranchetia, no amplification 
product could be obtained using these primers. 
Substituting a 3’ primer located at position 
1300 in the tobacco rbcL sequence and using an 
18S amplification protocol (Nickrent, 1993) resulted 
in high yields (Table 1). 18S nrDNA was 
amplified by using either the primers and protocols 
of Nickrent (Nickrent, 1993; Nickrent & Starr, 
1993) or those of Hamby et al. (1988). 

Particular attention was given to purification of 
PCR products to avoid superimposed sequences 
that can result from priming by the amplification 
primers as well as the internal sequencing primer 
during double-stranded sequencing. Samples were 
run on a 2% low melt agarose gel (NuSieve GTG) 
with 1× TAE buffer and ethidium bromide, bands 
were visualized by means of UV illumination, then 
removed as gel plugs. To remove agarose and con- 
centrate the PCR product, gel plugs were melted 
at 65°C for approximately 10 min., then further 
purified and concentrated with glass milk 
(GeneClean). This procedure proved especially important 
for obtaining high-quality sequences for rbcL 
and 18S nrDNA, but amplification products for 
atpB frequently gave excellent results when sequenced 
after using only the GeneClean purification 
process. 


DOUBLE-STRANDED SEQUENCING 


The purified dsPCR product was sequenced di- 
rectly with the dideoxy-termination method and 
Sequenase T7 DNA polymerase (US Biochemical) 
using the protocol of Thien (1990) with the fol- 
lowing modifications: the addition of 1% acetamide 
to the annealing reaction and an incubation temperature 
of 46-47°C for the termination step. Internal 
sequencing primers for atpB are shown in 
Table 2 and Figure 1. Sequencing of rbcL used a 
combination of internal primers, kindly provided 
by G. Zurawski (DNAX Research Institute, Palo 
Alto, California) and a few primers designed spe- 
cifically for this study (positions and sequences 
available from SBH). The 18S internal primers 
used to date were generously furnished by D. Nickrent 
(Nickrent & Starr, 1993) or E. A. Zimmer 
(Hamby et al., 1988). 

Aliquots of the sequencing reactions were loaded 
on two 60 × 33 cm field-gradient 6% polyacrylamide 
gels and subjected to electrophoresis overnight 
(short run, 600 V; long run, 1100 V). After 
transferring to 3MM Whatman paper, the gels 
were vacuum-dried for approximately one hour at 
80°C and exposed to autoradiography film for 1- 
3 days. Typical autoradiographs from gels run in 
this manner yielded 300 to 350 readable bases. 
All sequences used in this study are available from 
GSDB or directly from the senior author. Consistently, 
atpB proved the easiest to sequence of the 
three genes examined: it amplified readily, suffered 
the least from multiple banding patterns when the 
PCR product was not gel-purified, and often yielded 
readable sequences 300-350 bp from the internal 
primers. 


QUALITY CONTROL OF SEQUENCE DATA 


Sequence comparisons for the genes atpB, 18S, 
and rbcL included 1468, 1671, and 1397 bp, 
respectively (Table 1). Both strands of DNA were 
sequenced for both atpB and rbcL with approximately 
80% overlap. Both strands were also sequenced 
for 18S, but with much less overlap between 
the two directions (30-40%). The sequences 
were read from the autoradiographs, recorded on 
a data sheet, entered into MacClade (Maddison & 
Maddison, 1992), then printed and rechecked from 
the autoradiographs for errors. 

Alignment problems (caused by compressions) 
for atpB and rbcL often could be rectified by 
reading the opposite strand. Within the atpB gene, 
the following regions were susceptible to compressions: 
positions 47-53, 875-879, and 1455-1457. 
There were several regions in the 18S nrDNA 
sequences where alignment was impossible because 
of compressions or base insertion/deletion events. 
These regions were deleted from the data matrix 
and are located at the following positions in relation 
to the soybean 18S sequence (Eckenrode et al., 
1985): 224-231, 667-670, 710, 738, 1174-1175, 
1366, and the very end of the amplified 
region, 1711-1761. Sequence divergence values 
(described below) calculated between pairs of se- 
quences excluded these problematic regions or po- 
sitions. The possibility of PCR-generated anoma- 
lous sequences was checked by comparison of 
sequences from closely related taxa. Sequences 
furnished by other labs were also checked for in- 
consistencies both by comparison with other closely 
related taxa and occasional duplicate sampling of 
the same genus (e.g., the rbcL and 18S sequences 
for Akebia, and the rbcL sequence for Dicentra). 


DATA ANALYSIS 


Phylogenetic analyses were performed with 
PAUP 3.1 (Swofford, 1993) using the branch-and-bound 
search option (with collapse of zero-length 
branches) to assure recovery of the most parsi- 
monious trees. PAUP was also used to perform 
bootstrap analysis with 1000 replications using the 
branch-and-bound search option (Felsenstein, 
1985). The decay indices (the number of steps that 
must be added to the minimal-length tree before a 
TABLE 3. Comparison of data sets from atpB, 18S nrDNA, rbcL, and a combination of all data. Numbers in 
parentheses indicate number of informative three-state characters (excluding those where two of the three states were 
autapomorphies). The % Ts = unambiguous transitions/unambiguous changes × 100 and was calculated for each 
gene from one of the most parsimonious trees. g1 is a measure of the skewness of the distribution of 100,000 randomly 
generated trees. Tree length (TL) was calculated including uninformative characters; consistency index (CI) and 
rescaled consistency index (RC) were calculated excluding uninformative characters. RI = retention index. 

                Variable  Informative  Binary      3-state     4-state                 Number 
Gene            sites     sites        characters  characters  characters  %Ts  g1     of trees  TL   CI    RI    RC 

atpB            143       42           36          11 (6)      0           69   -1.36  3         172  0.73  0.74  0.54 
18S             76        25           21          6 (4)       0           66   -1.19  3         88   0.81  0.85  0.69 
rbcL            172       56           47          21 (9)      0           56   -0.59  1         225  0.64  0.68  0.44 
Combined data   391       123          104         38 (19)     0           61   -0.82  2         493  0.67  0.70  0.47 





clade collapses) were computed for all trees using 
the heuristic search option (Donoghue et al., 1992). 
A tree length distribution of 100,000 randomly 
sampled trees was generated for each of the atpB, 
18S, and rbcL data matrices using the “random 
trees” selection of PAUP. This distribution was 
analyzed for skewness as an estimate of nonrandom 
structure in the sequences (Hillis & Huelsenbeck, 
1992). Sequence divergence values were computed 
as the proportion of divergent sites from direct 
pairwise comparisons of the sequence data. 
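
The divergence measure described above is an uncorrected proportion of differing sites. A minimal sketch follows, assuming two aligned sequences of equal length held as plain strings and skipping any position where either sequence has a missing or ambiguous state; the function name and the toy sequences are hypothetical and are not data from this study.

```python
from typing import Tuple

VALID_BASES = set("ACGT")


def p_distance(seq_a: str, seq_b: str) -> Tuple[int, int, float]:
    """Return (divergent sites, sites compared, proportion divergent).

    Positions where either sequence is not A, C, G, or T are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    divergent = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in VALID_BASES or b not in VALID_BASES:
            continue  # missing or ambiguous state: not compared
        compared += 1
        if a != b:
            divergent += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return divergent, compared, divergent / compared


# Toy example only: one difference over seven comparable positions.
print(p_distance("ACGTAC?T", "ACGTTCAT"))
```

Because skipped positions shrink the denominator, the proportion for a given pair need not equal the count of divergent sites divided by the full alignment length.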

There has been much discussion about the ad- 
vantages and disadvantages of analyzing indepen- 
dent data sets separately, combining independent 
data sets prior to phylogenetic analyses, or ana- 
lyzing them separately and then applying consensus 
methods (see Bull et al., 1993, for a review). There 
is clearly a strong argument for combining data 
sets, especially in cases where there is substantial 
homoplasy and the phylogenetic signal in a partic- 
ular data set is insufficient to resolve certain 
branching patterns in a tree (Kluge, 1989; Barrett 
et al., 1991; Olmstead & Sweere, 1994). For this 
reason, we have chosen to analyze a combination 
of all the data sets (atpB, 18S, and rbcL) as well 
as each data set separately. 

Alternative tree topologies and resultant changes 
in tree length were explored using MacClade 3.0 
(Maddison & Maddison, 1992). MacClade was also 
used to calculate character transformations of var- 
ious types for each sequence tree (for example, 
transition/transversion bias and the number of 
changes at different codon positions). 

Outgroup taxa for the Lardizabalaceae in all 
analyses were selected based on the results of sev- 
eral previous phylogenetic analyses of the Ran- 
unculidae (sensu Takhtajan, 1987). A cladistic 


analysis based on traditional data placed Berberi- 
daceae, Menispermaceae, and the Ranunculineae 
(Ranunculaceae plus Papaveraceae) as potential 
sister taxa to the Lardizabalaceae (Loconte & Ste- 
venson, 1991). However, preliminary analyses of 
atpB and rbcL data (analyzed as separate and 
combined data sets) with extensive sampling of the 
ranunculids consistently place the Ranunculaceae 
as the most derived family of the Ranunculidae. 
Representatives from two families, Menisperma- 
ceae and Berberidaceae, are the basal members of 
a large clade that is resolved as the sister group 
to the Lardizabalaceae (Hoot & Crane, work in 
progress). The Papaverales are resolved as rela- 
tively basal to the Lardizabalaceae and other ran- 
unculids (Chase et al., 1993; Hoot & Crane, work 
in progress). Therefore, in this paper Dicentra 
eximia (Fumariaceae, Papaverales) was used to 
root the phylogenetic analyses with Tinospora 
(Menispermaceae) and Nandina (Berberidaceae) 
included as additional outgroup taxa (Table 1). 


RESULTS 
PHYLOGENETIC ANALYSES 


Table 3 lists the number of variable positions, 
informative characters (after removal of autapo- 
morphies), and binary, three- and four-state char- 
acters for each gene. 

Analysis based on the atpB data resulted in three 
equally parsimonious trees; Stauntonia, Akebia, 
and Holboellia are unresolved due to lack of vari- 
able sites. These trees were based on 143 variable 
sites (Table 3; 42 informative characters) with a 
tree length (TL) = 172, a consistency index ex- 
cluding autapomorphies (CI) = 0.73 (Kluge & 
Farris, 1969), and a retention index (RI) = 0.74 


Volume 82, Number 2 
1995 


Hoot et al. 201 


atpB Gene Sequences 





(Farris, 1989). The strict consensus cladogram de- 
rived from the three trees is presented in Figure 
2. The monophyly of the Lardizabalaceae is well 
supported, with 19 base substitutions, a bootstrap 
value of 100%, and a decay index of 11. Sinof- 
ranchetia and Decaisnea are basal within the fam- 
ily. The remaining five genera form two clades, 
each of which is also supported by morphological 
characters (Hoot, Culham & Crane, work in prog- 
ress). The clade consisting of the two South Amer- 
ican genera, Boquila and Lardizabala, is only 
weakly supported (one base substitution, a boot- 
strap value of 64%, and a decay index of one) in 
contrast to stronger support for the clade com- 
prising the Asian genera, Stauntonia, Akebia, and 
Holboellia (four base substitutions, a bootstrap val- 
ue of 94%, and a decay index of three). 
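
The homoplasy indices quoted for each tree follow standard definitions, sketched here for a single data set. In this sketch, m is the minimum possible number of steps, s is the observed number of steps on the tree, and g is the maximum possible number of steps; these symbols and the function names are our own notation. The rescaled consistency index is the product of CI and RI, which can be checked against the rounded values reported in Table 3.

```python
def consistency_index(m: float, s: float) -> float:
    """CI = minimum possible steps / observed steps."""
    return m / s


def retention_index(m: float, s: float, g: float) -> float:
    """RI = (maximum steps - observed steps) / (maximum steps - minimum steps)."""
    return (g - s) / (g - m)


def rescaled_consistency_index(ci: float, ri: float) -> float:
    """RC = CI * RI."""
    return ci * ri


# Reported (CI, RI, RC) triples from Table 3, used only as a consistency check.
reported = {
    "atpB": (0.73, 0.74, 0.54),
    "18S": (0.81, 0.85, 0.69),
    "rbcL": (0.64, 0.68, 0.44),
    "combined": (0.67, 0.70, 0.47),
}
for gene, (ci, ri, rc) in reported.items():
    print(gene, round(rescaled_consistency_index(ci, ri), 2), rc)
```

Note that CI here excludes uninformative characters while TL includes them, so CI cannot be recomputed from the reported tree lengths alone.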

The 18S nrDNA data matrix, consisting of 76 
variable sites (25 informative sites), resulted in 
three most parsimonious trees with a TL = 88, CI 
(excluding autapomorphies) = 0.81, and RI = 0.85. 
One of the shortest trees is presented in Figure 2 
(the branch that collapses in the strict consensus 
tree is indicated with dotted lines). As in the atpB 
tree, the monophyly of the Lardizabalaceae with 
respect to the three outgroups is well supported 
with nine nucleotide changes, a bootstrap value of 
99%, and a decay index of seven. The 18S tree 
is congruent with the atpB tree in other respects 
as well: Sinofranchetia and Decaisnea are basal 
within the family and the clade consisting of Lar- 
dizabala, Boquila, Akebia, Stauntonia, and Hol- 
boellia is recognized but with less internal reso- 
lution. 

The rbcL data consisting of 172 variable sites 
(56 informative sites) resulted in one fully resolved 
most parsimonious tree (Fig. 2). The TL = 225, 
CI (excluding autapomorphies) = 0.64, and RI = 
0.68. However, unlike the results from atpB and 
18S, the rbcL sequence data do not support the 
monophyly of the Lardizabalaceae, placing Sin- 
ofranchetia with the outgroup Dicentra in a basal 
position with respect to all other genera. The rbcL 
tree is congruent with the atpB tree in placing 
Decaisnea as the sister genus to the clade con- 
sisting of Lardizabala, Boquila, Akebia, Staun- 
tonia, and Holboellia, although the pattern of re- 
lationships among these five genera is different (the 
clade containing Lardizabala and Boquila is not 
recognized). 

A further analysis performed using a combination 
of all three data sets (atpB, rbcL, and 18S 
nrDNA) resulted in two trees derived from 391 
variable sites (123 informative sites) with a TL = 


493, CI (excluding autapomorphies) = 0.67, and 
RI = 0.70. The clade that collapses in a strict 
consensus tree of the two most parsimonious trees 
is shown with dotted lines (Fig. 2). Excluding re- 
lationships among outgroups, the consensus tree 
based on all data is most similar to the tree based 
on atpB data alone. The monophyly of the Lar- 
dizabalaceae is again strongly supported with 37 
nucleotide changes, a bootstrap value of 100%, 
and a decay index of 14. The combined data also 
support the clades consisting of (Lardizabala, Boquila, 
(Akebia, Stauntonia, and Holboellia)), but 
with higher bootstrap values and decay indices (Fig. 
2). 


CHARACTER TRANSFORMATIONS 


The number of substitutions inferred for each 
nucleotide position was calculated over the trees 
derived from the three sequence data sets using 
MacClade 3.0 (Maddison & Maddison, 1992). For 
both atpB and rbcL, substitutions occur fairly uni- 
formly across the gene (Fig. 3). Even considering 
the exclusion of some nucleotide positions due to 
compressions or insertion/deletion events (see 
above), the nucleotide substitutions are much less 
evenly distributed across the 18S gene. This is 
probably due to constraints imposed by the secondary 
structure of 18S nrDNA (Gutell & Woese, 
1990; Nickrent & Sargent, 1991; Dixon & Hillis, 
1993). There is variation in the maximum number of steps/ 
site found for each gene, ranging from four steps 
in atpB to seven in rbcL. 

Most of the unambiguous changes (142 of 172 
changes or 83%) in the most parsimonious atpB 
trees occur at third-position sites contrasted with 
15 unambiguous changes each at first and second 
positions. A similar pattern is found with the rbcL 
tree: 41 changes in the first-position, 21 in second- 
position, and 167 in third-position sites (73%). The 
CIs (including autapomorphies) for the various po- 
sitions are high for both genes, even at third-po- 
sition sites: 1.0 (atpB) and 0.73 (rbcL) for first 
positions, 0.93 and 0.90 for second positions, and 
0.88 and 0.86 for third positions. 

There are approximately twice as many transitions 
(94-95) as transversions (42-43) when calculated 
over the three most parsimonious atpB 
trees. A similar proportion of transitions to transversions 
occurs with the 18S data, 51-52 transitions 
to 25-26 transversions. However, the proportion 
is more even when calculated over the most 
parsimonious rbcL tree, 64 transitions to 51 transversions. 
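
The transition bias can be expressed either as a ratio or as the percentage of unambiguous changes that are transitions (the % Ts of Table 3). A short sketch using the counts quoted above; the function names are ours, and where the text gives a range across equally parsimonious trees only the lower counts are used here.

```python
def percent_transitions(transitions: int, transversions: int) -> float:
    """% Ts = transitions / (transitions + transversions) * 100."""
    return 100.0 * transitions / (transitions + transversions)


def ts_tv_ratio(transitions: int, transversions: int) -> float:
    """Ratio of transitions to transversions."""
    return transitions / transversions


# (transitions, transversions) as quoted in the text.
counts = {"atpB": (94, 42), "18S": (51, 25), "rbcL": (64, 51)}
for gene, (ts, tv) in counts.items():
    print(gene, round(percent_transitions(ts, tv), 1), round(ts_tv_ratio(ts, tv), 2))
```

The atpB and rbcL percentages computed this way round to the Table 3 entries (69 and 56); the 18S value depends on which of the equally parsimonious trees is used.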
FIGURE 2. Most parsimonious phylogenetic trees resulting from atpB, 18S nrDNA, rbcL, and a combination of 
all three sequence data sets. Numerals above branches indicate the number of nucleotide changes supporting each 
branch. Numerals below in parentheses indicate the percentage of times that the branch was recovered in 1000 
bootstrap replications. Numbers below and to the right of the bootstrap values are decay indices, indicating how many 
additional steps are necessary before the branch collapses. For 18S and a combination of all three data sets, only 
one of several equally parsimonious topologies (with appropriate values) is illustrated. Dotted lines in the trees based 
on 18S and a combination of all three data sets indicate where branches collapse in the strict consensus trees derived 


from multiple most parsimonious trees. 
COMPARISON OF SEQUENCE DIVERGENCE AND 
PHYLOGENETIC SIGNAL 


Table 4 presents the sequence divergence values 
for atpB, 18S, and rbcL gene sequences from 
selected pairs of genera chosen to cover the entire 
range of divergence found within this study. The 
highest divergence values are found in pairwise 
comparisons with the outgroup genus Dicentra. 
Comparing divergence between the three genes, 
the highest values are found in the rbcL sequences 
(5.2%), followed by atpB (4.8%) and 18S (2.5%). 
The range narrows in pairwise comparisons be- 
tween more closely related genera, with no sub- 
stantial difference in the values found in the pair- 
wise comparison of Stauntonia and Holboellia for 
the three genes (0.2-0.6%). Divergence values 
were calculated simply as a proportion of divergent 
sites in each sequence comparison with no provision 
made to account for superimposed events (multiple 
hits) which must have occurred at many positions. 
Using this algorithm, once an initial substitution 
has occurred at a position, subsequent changes at 
that same position cannot increase the divergence, 
but divergence can be decreased by converting the 
novel nucleotide back to the original condition (par- 
allelism or reversal). For this reason, divergence 
values do not increase uniformly with the number 
of substitution events, but instead increase rapidly 
at first and more slowly thereafter (Swofford & 
Olsen, 1990). A graph of sequence divergence 
values resulting from pairwise comparisons of 
Stauntonia with four genera is shown in Figure 4. 
While the divergence values for atpB and 18S 
increase with more distantly related taxa, there is 
a noticeable flattening of the curve with rbcL (Fig. 
4, Table 4). This suggests that, at greater taxo- 
nomic distances, superimposed events caused by 
more frequent substitutions are more of a factor 
in this rbcL data set and that some of the estimates 
of sequence divergence for this gene are probably 
artificially low. 
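
The flattening described above is what a simple substitution model predicts for uncorrected divergence. As an illustration only, the sketch below uses the Jukes-Cantor one-parameter model, which was not applied in this study and is substituted here purely to show the shape of the relationship: the expected observed proportion p rises quickly with the true number of substitutions per site d and then levels off toward 0.75. Function names are our own.

```python
import math


def expected_p(d: float) -> float:
    """Expected uncorrected divergence for d substitutions per site (Jukes-Cantor)."""
    return 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))


def jc69_distance(p: float) -> float:
    """Jukes-Cantor corrected distance for an observed proportion p (0 <= p < 0.75)."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75)")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


# Observed divergence always lags behind the number of substitution events.
for d in (0.01, 0.05, 0.2, 1.0, 5.0):
    print(d, round(expected_p(d), 3))
```

At the divergence levels in Table 4 the correction under this model is small, so the sketch explains the direction of the bias rather than its magnitude in any one comparison.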

The distribution of the lengths of 100,000 ran- 
domly generated trees for the atpB sequence data 
reflects considerable nonrandom structure (Fig. 5). 
The skewness of this distribution, measured by a 
g1 value of -1.36, far exceeds the P = 0.01 critical 
value for data sets of this size (critical values of g1 
for four-state characters, 10 taxa, and 100/250 




FIGURE 3. Histograms showing distribution of nucleotide 
changes and the number of steps/site or character 
as calculated from the most parsimonious phylogenetic 
trees derived from atpB, 18S nrDNA, and rbcL sequence 
data. 
sequences. Divergence values in the upper right half of each matrix are the proportion of divergent sites in each 
comparison. Actual number of divergent sites calculated from the original data matrix appears in the lower left half 
of each matrix. Sequences were not compared at positions with missing or ambiguous states. 


atpB                 1      2 

1. Dicentra          0      0.044 
2. Sinofranchetia    62     0 
3. Lardizabala       64     14 
4. Holboellia        68     17 
5. Stauntonia        67     15 

18S rDNA             1      2 

1. Dicentra          0      0.022 
2. Sinofranchetia    35     0 
3. Lardizabala       32     12 
4. Holboellia        41     20 
5. Stauntonia        36     14 

rbcL                 1      2 

1. Dicentra          0      0.052 
2. Sinofranchetia    67     0 
3. Lardizabala       64     60 
4. Holboellia        65     54 
5. Stauntonia        68     56 


variable characters were -0.33 and -0.27, respectively; 
Hillis & Huelsenbeck, 1992). The g1 
values for 18S nrDNA and rbcL (-1.19 and 
-0.59, Table 3) were smaller in magnitude but also significant. 
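
The g1 statistic is the sample skewness of the tree-length distribution: the third central moment divided by the 1.5 power of the second central moment. A strongly negative value means that a small number of trees are much shorter than the bulk of random trees, which is taken as evidence of nonrandom structure. The sketch below computes g1 for any list of tree lengths; the function name and the toy lists are hypothetical and are not the distributions analyzed here.

```python
from typing import Sequence


def g1_skewness(values: Sequence[float]) -> float:
    """Sample skewness g1 = m3 / m2**1.5, using central moments m2 and m3."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    if m2 == 0:
        raise ValueError("all values are identical; skewness is undefined")
    return m3 / m2 ** 1.5


# Toy lists only: a symmetric distribution and a left-skewed one.
symmetric_lengths = [200, 201, 202, 203, 204]
left_skewed_lengths = [210, 210, 210, 209, 205]
print(g1_skewness(symmetric_lengths), g1_skewness(left_skewed_lengths))
```

In practice the input would be the lengths of the 100,000 randomly sampled trees for a given data matrix.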


DISCUSSION 


The trees derived from all three genes and the 
combined data give broadly similar phylogenetic 
results. All analyses support the recognition of a 
clade comprising Akebia, Stauntonia, and Hol- 
boellia within a broader clade that includes Lar- 
dizabala and Boquila. An important morpholog- 
ical feature that is diagnostic of this inclusive clade 
is laminar placentation. The two Lardizabalaceae 
with submarginal placentation (Decaisnea and Sin- 
ofranchetia) are placed external to this group. The 
relatively basal position of Decaisnea and Sinof- 
ranchetia is consistent with the results of prior 
phylogenetic analyses based on traditional data and 
classification schemes (Loconte & Estes, 1989; 
Qin, 1989). There is also weak support (from atpB, 
18S, and combined data) for the two South Amer- 
ican taxa (Boquila and Lardizabala) as sister gen- 
era. 

The efficacy of the atpB gene for phylogenetic 
reconstruction is well supported by the number of 
variable sites and the relatively high consistency 
and retention indices in our atpB analysis (Table 


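atpB (continued)     3      4      5 

1. Dicentra          0.045  0.048  0.046 
2. Sinofranchetia    0.010  0.012  0.011 
3. Lardizabala       0      0.006  0.005 
4. Holboellia        9      0      0.003 
5. Stauntonia        [illegible]  4      0 

18S rDNA (continued) 3      4      5 

1. Dicentra          0.020  0.025  0.022 
2. Sinofranchetia    0.007  0.012  0.009 
3. Lardizabala       0      0.007  0.004 
4. Holboellia        12     0      0.006 
5. Stauntonia        7      9      0 

rbcL (continued)     3      4      5 

1. Dicentra          0.046  0.047  0.049 
2. Sinofranchetia    0.046  0.042  0.043 
3. Lardizabala       0      0.006  0.008 
4. Holboellia        9      0      0.002 
5. Stauntonia        12     3      0 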


3). However, another measure of the accuracy of 
the emergent phylogenies is provided by comparing 
the atpB tree with trees derived from other data 
sets. The atpB tree is largely congruent with the 
trees derived from 18S nrDNA and with a tree 
derived from a combination of all three data ma- 
trices (Fig. 2). However, unlike the trees resulting 
from atpB and 18S nrDNA alone, and a combi- 
nation of all the molecular data, the rbcL tree does 
not recognize the monophyly of the Lardizabala- 
ceae nor does it recognize the Boquila and Lar- 
dizabala clade. All possible pairwise combinations 
of the atpB, 18S, and rbcL data sets also support 
the monophyly of the Lardizabalaceae, although 
only the combination of atpB and 18S gave a single 
tree that supported the sister relationship of Bo- 
quila and Lardizabala. Results from the rbcL data 
are therefore somewhat incongruent with those from 
other genes, various combined data sets, and also 
morphological studies. Since Decaisne’s work on 
the family (1837-1838, 1839), the Lardizabala- 
ceae have been considered a natural family (Prantl, 
1891; Taylor, 1967; Hutchinson, 1973; Cron- 
quist, 1981; Takhtajan, 1987; Qin, 1989), and 
this is supported by a previous cladistic analysis of 
the Ranunculales (Loconte & Estes, 1989), as well 
as preliminary analyses of the family based on 
morphology (Hoot, Culham & Crane, work in progress). 
There are two unambiguous characters within 
the rbcL data set that support the monophyly of 
the Lardizabalaceae, but they are outweighed by 
eight unambiguous characters supporting the in- 
clusion of Sinofranchetia with the outgroup, Di- 
centra. Combining the rbcL data set with either 
the atpB or the 18S nrDNA data set is sufficient 
to counteract the rbcL characters supporting poly- 
phyly, resulting in trees recognizing a monophyletic 
Lardizabalaceae. 

FIGURE 4. Graph of divergence values in pairwise 
comparisons of selected genera with Stauntonia 
(Ho/St, La/St, Si/St, Di/St). Ho = Holboellia, 
St = Stauntonia, La = Lardizabala, Si = 
Sinofranchetia, and Di = Dicentra. 

The statistics and indices of support for partic- 
ular tree topologies and branching patterns also 
favor the monophyletic status of the Lardizabala- 
ceae, as opposed to the separation of Sinofran- 
chetia favored by the rbcL tree. Table 3 presents 
the various CIs, RIs, and rescaled consistency in- 
dices (RC; Farris, 1989). By any of these measures, 
trees supporting the family’s monophyly exhibit 
less homoplasy than is found with the rbcL tree. 
The number of nucleotide changes, the bootstrap 
values (99–100%), and decay indices (7-14) for 
the branch supporting the monophyly of the family 
are extremely high in the trees with this topology. 
Furthermore, moving Sinofranchetia basal to all 
the rest of the Lardizabalaceae as found in the 
atpB, 18S, and combined-data trees adds only four 
steps (1.7% of the tree length excluding noninfor- 
mative characters) to the rbcL tree length. In con- 
trast, despite the smaller number of informative 
sites, moving Sinofranchetia to any outgroup po- 
sition costs a minimum of 12 steps (6.9% of TL) 
in the atpB tree, seven steps (7.9% of TL) in the 
18S tree, and 15 steps in the combined data tree 
(3.0% of TL). 
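
The indices and step costs discussed above can be illustrated with a minimal sketch. It assumes the usual definitions: CI is the minimum possible number of steps divided by the observed number, RI is the retained fraction of possible homoplasy, and RC is their product (Farris, 1989). The function names and the example step counts are hypothetical and are not taken from the data matrices of this study; the tree length of 500 is chosen only so that 15 extra steps correspond to 3.0% of TL.

```python
def fit_indices(min_steps, observed_steps, max_steps):
    """Return (CI, RI, RC) for a tree, given step counts summed over characters.

    min_steps      -- fewest steps the characters could require on any tree
    observed_steps -- steps required on the tree being evaluated
    max_steps      -- most steps the characters could require on any tree
    """
    if observed_steps <= 0:
        raise ValueError("observed_steps must be positive")
    ci = min_steps / observed_steps
    if max_steps == min_steps:
        # No possible homoplasy, so RI is undefined; treat it as 1.0 here.
        ri = 1.0
    else:
        ri = (max_steps - observed_steps) / (max_steps - min_steps)
    rc = ci * ri  # rescaled consistency index
    return ci, ri, rc


def cost_percent(extra_steps, tree_length):
    """Express the extra steps of a constrained topology as a percent of TL."""
    return 100.0 * extra_steps / tree_length


# Hypothetical step counts, for illustration only.
ci, ri, rc = fit_indices(min_steps=80, observed_steps=100, max_steps=180)
print(f"CI={ci:.2f} RI={ri:.2f} RC={rc:.2f}")
print(f"{cost_percent(15, 500):.1f}% of TL")
```

A lower CI, RI, or RC for one tree than for another indicates more homoplasy, which is the sense in which the rbcL tree is compared with the others.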

FIGURE 5. Histogram showing skewness (g1 = -1.36) 
of tree lengths for 100,000 randomly generated trees 
derived from atpB sequence data. 

Several experiments were conducted on the rbcL 
data set to test how firmly Sinofranchetia was 
separated from other genera of Lardizabalaceae. 
There were two potentially informative characters 
(at tobacco positions 1345 and 1380) missing from 
the Sinofranchetia rbcL sequence due to ampli- 
fication problems (see Materials and Methods). To 
test whether these missing data were responsible 
for the reduced support for Lardizabalaceae mono- 
phyly, we inserted the character states found in 
other Lardizabalaceae. This produced a single most 
parsimonious tree with exactly the same topology 
as the rbcL tree (Fig. 2). Assuming that positions 
with more than two states may be particularly 
subject to mutation, a further analysis was per- 
formed in which all positions with three or more 
character states were omitted from the rbcL data 
set. This also resulted in a tree with the same 
topology as was found with the full rbcL data set. 
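
The second experiment amounts to a column filter on the alignment. The following is a minimal sketch, assuming the alignment is held as a dictionary of equal-length strings and that gaps and ambiguity codes ('-', '?', 'N') are not counted as character states. The function name and the toy sequences are hypothetical and do not reproduce the software or data used in this study.

```python
def drop_multistate_columns(alignment, max_states=2):
    """Keep only alignment columns with at most `max_states` distinct states.

    alignment -- dict mapping taxon name to an aligned sequence string
    Gaps and ambiguity codes are ignored when counting states (an assumption).
    """
    names = list(alignment)
    length = len(alignment[names[0]])
    if any(len(alignment[n]) != length for n in names):
        raise ValueError("all sequences must have the same aligned length")

    ignored = {"-", "?", "N"}
    keep = []
    for i in range(length):
        states = {alignment[n][i].upper() for n in names} - ignored
        if len(states) <= max_states:
            keep.append(i)

    return {n: "".join(alignment[n][i] for i in keep) for n in names}


# Toy alignment: the third column has three states (G, C, A) and is dropped.
toy = {"taxon1": "ACGT", "taxon2": "ACCT", "taxon3": "ATAT"}
print(drop_multistate_columns(toy))
```

The filtered alignment would then be analyzed in the same way as the full data set, so that any change in topology can be attributed to the omitted positions.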

The lack of support for Lardizabalaceae mono- 
phyly may be, at least in part, a “long-branch” 
attraction effect between Dicentra and Sinofran- 
chetia, caused by the relatively higher rate of 
substitutions in rbcL compared with the other two 
sequences. The higher number of variable char- 
acters, the larger number of positions with three- 
state characters, less left-handed skewness in the 
data set (g1 = -0.59; Table 3), and the higher 
sequence divergence values between matched pairs 
of taxa (Table 4) indicate a higher rate of nucleotide 
substitutions. Removal of third-position sites results 
in the exclusion of both Decaisnea and Sinofran- 
chetia from the Lardizabalaceae (a similar exper- 
iment with the atpB data yields a consensus tree 
less resolved but congruent with that found in Fig. 
2). This suggests that substitution rates within the 
rbcL gene are at a level where saturation is just 
as likely to occur at some first and second positions 
as third positions. The homoplasy values calculated 
for first-, second-, and third-position sites (0.27, 
0.10, and 0.14, respectively) in this and other 
phylogenetic studies using rbcL sequences support 
this conclusion (Donoghue et al., 1992; Kim et al., 
1992; Chase et al., 1993). 
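
The skewness statistic cited above can be sketched as follows. This assumes the standard moment-based definition of g1 (third central moment divided by the 1.5 power of the second central moment); whether the software used in this study applies a sample-size correction is not stated here, so the function is illustrative only. The function name and the toy tree lengths are hypothetical.

```python
def g1_skewness(tree_lengths):
    """Moment-based skewness g1 of a distribution of tree lengths.

    Negative values indicate a left-skewed distribution, i.e. a long tail
    of short trees and relatively few trees near the minimum length.
    """
    n = len(tree_lengths)
    if n == 0:
        raise ValueError("need at least one tree length")
    mean = sum(tree_lengths) / n
    m2 = sum((x - mean) ** 2 for x in tree_lengths) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in tree_lengths) / n  # third central moment
    if m2 == 0:
        raise ValueError("all tree lengths are identical; g1 is undefined")
    return m3 / m2 ** 1.5


# Toy example: most trees are long and one is much shorter (left skew).
print(g1_skewness([10, 10, 10, 10, 2]))
```

On this definition, a g1 closer to zero, as reported for rbcL, corresponds to a more nearly symmetric distribution of random tree lengths.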

All possible configurations of outgroup taxa (i.e., 
only Nandina, Tinospora, or Dicentra included, 
or each removed, leaving two outgroup represen- 
tatives) were tested for their effect on the resolution 
of the rbcL data. Only the removal of Dicentra 
resulted in a monophyletic Lardizabalaceae. Fre- 
quently, increased sampling will correct for long- 
branch problems caused by widely divergent taxa 
(Donoghue et al., 1992; Olmstead et al., 1992; 
Chase et al., 1993; Qiu et al., 1993). However, 
in preliminary analyses of the ranunculids (includ- 
ing a number of Papaverales and the two genera 
comprising the Circaeasteraceae) based on rbcL 
data, the increased sampling does not move Sin- 
ofranchetia into the Lardizabalaceae (similar anal- 
yses of atpB data continue to recognize a mono- 
phyletic Lardizabalaceae). Work in progress will 
focus on the addition of the potential outgroup 
genus, Sargentodoxa, as well as increased sam- 
pling in the vicinity of Dicentra (i.e., Fumariaceae, 
Hypecoaceae, and other Papaverales). 

For the three data sets, all comparative indi- 
cators of evolutionary rate, such as the number of 
variable sites, informative characters, and three- 
state characters, g1 values, and indices measuring 
homoplasy, indicate that 18S nrDNA is the most 
conserved, atpB is intermediate, and rbcL is the 
least conserved of the three genes (Table 3). Se- 
quence divergence values between matched pairs 
of taxa also suggest an intermediate rate of nucle- 
otide substitutions for atpB (Table 4), and this is 
consistent with preliminary results for other ran- 
unculid and (lower) hamamelid families that we 
have examined (Hoot & Crane, unpublished data). 
We conclude that while atpB, 18S nrDNA, and 
rbcL sequences are all useful for phylogenetic re- 
construction at higher taxonomic levels, there are 
substantial differences in the degree of conservation 
of nucleotides among the three genes. In large- 
scale surveys of divergent taxa, a combination of 
several sequence data sets seems likely to provide 
the best possibility of resolving both proximal and 
distal branching patterns. 
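
Pairwise comparisons of the kind summarized in Table 4 can be sketched as below. The sketch assumes an uncorrected divergence (the proportion of compared sites that differ) and skips any site where either sequence has a gap or ambiguity code; the divergence measure actually used for Table 4 is not specified in this section, so this is an assumption. The function name and the toy sequences are hypothetical.

```python
def p_distance(seq_a, seq_b):
    """Return (number of differences, uncorrected divergence) for two
    aligned sequences, comparing only sites where both have A, C, G, or T."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same aligned length")

    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "ACGT" and b in "ACGT":
            compared += 1
            if a != b:
                differences += 1

    if compared == 0:
        raise ValueError("no comparable sites")
    return differences, differences / compared


# Toy sequences: one difference over eight compared sites.
print(p_distance("ACGTACGT", "ACGTACGA"))
```

Computing the same statistic for the same pair of taxa in each gene gives values that can be compared directly across atpB, 18S nrDNA, and rbcL.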